Re: [hybi] [permessage-deflate] Compressing fragmented data

"Arman Djusupov" <arman@noemax.com> Tue, 21 January 2014 12:44 UTC

From: Arman Djusupov <arman@noemax.com>
To: 'Takeshi Yoshino' <tyoshino@google.com>
Date: Tue, 21 Jan 2014 14:44:05 +0200
Cc: hybi@ietf.org

Hello Takeshi,

 

If the empty-block-in-final-frame solution is the preferred one, then I think it would be useful to add an example of such an empty block for cases where there is no compressed data left to send in the final frame because the compression stream has already been flushed and byte-aligned. DEFLATE implementations such as zlib do not normally produce an empty block when there is no data to flush, so developers will need some understanding of the DEFLATE format to produce this block artificially, which invites mistakes. Since it is rare for data to be aligned in such a way that there is nothing to send in the final frame, such mistakes might go undetected during development and testing, then surface in production as occasional communication failures. So IMHO adding an example would be useful.
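The zlib behaviour mentioned above can be observed with a minimal sketch. It assumes Python's standard zlib module and a raw DEFLATE stream (negative window bits); the variable names are illustrative only and are not taken from the draft.

```python
import zlib

SYNC_TAIL = b"\x00\x00\xff\xff"  # LEN and NLEN of an empty stored block

# Raw DEFLATE stream (negative wbits means no zlib header or trailer).
comp = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)

# First flush: there is buffered input, so zlib emits it and ends the
# output on an empty stored block.
first = comp.compress(b"hello") + comp.flush(zlib.Z_SYNC_FLUSH)

# Second flush with no new input: zlib has nothing to do and emits no
# bytes, so no empty block is produced for a final frame.
second = comp.flush(zlib.Z_SYNC_FLUSH)

print(first.endswith(SYNC_TAIL), len(second))
```

In this situation the sender has no zlib output to put in the final frame and would have to construct the empty block itself.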

 

Thanks!

 

With best regards,

Arman

 

From: Takeshi Yoshino [mailto:tyoshino@google.com] 
Sent: Tuesday, January 14, 2014 6:05 AM
To: Arman Djusupov
Cc: hybi@ietf.org
Subject: [permessage-deflate] Compressing fragmented data

 

Hi Arman,

 

Let me change the subject.

 

On Thu, Jan 9, 2014 at 11:46 PM, Arman Djusupov <arman@noemax.com> wrote:

Hello Takeshi,

 

In the current draft, the description of the compression process states:

 
   1.  Compress all the octets of the payload of the message using
       DEFLATE.

 

This doesn’t take into account outbound messages of arbitrary size that NEED to be fragmented, for which it is preferable to compress and send one fragment at a time until the end of the message is reached.

 

OK. I'll add some text to explain how this works for fragmented messages.

 

 

In such cases the implementation buffers the input from its source and then flushes it out into a compressed fragment, repeating this process until the end of the source data is reached. At that point it flushes any remaining bytes buffered into a final frame, or flushes a final frame of 0 length.
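A rough sketch of this buffer-and-flush loop, assuming Python's zlib module and non-empty input chunks. The helper name `compress_fragments` is hypothetical, and the special handling of the final frame is deliberately left out here.

```python
import zlib

SYNC_TAIL = b"\x00\x00\xff\xff"

def compress_fragments(chunks):
    """Yield one compressed frame payload per input chunk (hypothetical helper)."""
    comp = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    for chunk in chunks:
        # Z_SYNC_FLUSH byte-aligns the output so the fragment can be sent
        # immediately without waiting for the rest of the message.
        yield comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)

chunks = [b"first part, ", b"second part, ", b"last part"]
fragments = list(compress_fragments(chunks))

# The concatenated fragments form one DEFLATE stream that a receiver
# can inflate incrementally or in one go.
restored = zlib.decompressobj(-15).decompress(b"".join(fragments))
```

Every fragment produced this way ends with 0x00 0x00 0xFF 0xFF, which is what leads to the question about the last frame.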

 

 

When the implementation produces compressed fragments, it periodically produces frames with 0x00 0x00 0xFF 0xFF at the end of the frame due to flushing DEFLATE. But because of the following requirements:

 

   2.  If the resulting data does not end with an empty DEFLATE block
       with no compression (the "BTYPE" bits is set to 00), append an
       empty DEFLATE block with no compression to the tail end.

   3.  Remove 4 octets (that are 0x00 0x00 0xff 0xff) from the tail end.
       After this step, the last octet of the compressed data contains
       (possibly part of) the DEFLATE header bits with the "BTYPE" bits
       set to 00.

 

the implementation must ensure that the message ends with an empty block without a trailing 0x0000FFFF, so it must keep track of what type of block ended the last frame that was sent. If the last frame sent ends with 0x0000FFFF, the implementation cannot remove those 4 bytes, because they are already on the wire; instead it must artificially produce an empty DEFLATE block to send as the final frame. This is only due to the requirement to remove 0x0000FFFF.
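The bookkeeping described here could look roughly like the following sketch. It assumes Python's zlib module and non-empty chunks for non-final frames. The class `FragmentingSender`, the helper `inflate_message`, and the two octets used for `EMPTY_BLOCK` (a hand-built empty fixed-Huffman block) are assumptions made for illustration, not text from the draft.

```python
import zlib

TAIL = b"\x00\x00\xff\xff"
EMPTY_BLOCK = b"\x02\x00"  # assumption: hand-built empty fixed-Huffman block

class FragmentingSender:
    """Hypothetical sender that remembers how the last sent frame ended."""

    def __init__(self):
        self._comp = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
        self._ended_with_tail = False

    def fragment(self, chunk: bytes) -> bytes:
        """Payload of a non-final frame; it goes on the wire unmodified."""
        out = self._comp.compress(chunk) + self._comp.flush(zlib.Z_SYNC_FLUSH)
        self._ended_with_tail = out.endswith(TAIL)
        return out

    def final(self, chunk: bytes = b"") -> bytes:
        """Payload of the final frame, with the 4-octet tail removed."""
        if not chunk and self._ended_with_tail:
            # Nothing left to flush and the tail is already on the wire:
            # send an artificial empty block instead.
            return EMPTY_BLOCK
        out = self._comp.compress(chunk) + self._comp.flush(zlib.Z_SYNC_FLUSH)
        return out[:-4]

def inflate_message(frames) -> bytes:
    """Receiver side: join payloads, restore the removed tail, inflate."""
    return zlib.decompressobj(-15).decompress(b"".join(frames) + TAIL)

sender = FragmentingSender()
frames = [sender.fragment(b"abc"), sender.fragment(b"def"), sender.final()]
message = inflate_message(frames)
```

The flag `_ended_with_tail` is the extra state the sender has to carry between frames purely because of the removal requirement.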

 

Yes. It's just two bytes long: DEFLATE block header (3 bits) + fixed Huffman code for the end-of-block symbol (7 bits) + padding (6 bits).
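As a worked illustration of that bit layout, the sketch below packs the fields in DEFLATE bit order (least significant bit first) and checks the result with Python's zlib module. The helper `pack_bits_lsb_first` is hypothetical, and BFINAL=0 is an assumption, since the message above does not state the BFINAL value.

```python
import zlib

def pack_bits_lsb_first(fields):
    """Pack (value, bit_count) pairs starting at the least significant bit.

    Hypothetical helper; unused bits of the last octet are zero padding.
    """
    acc = 0
    used = 0
    for value, bit_count in fields:
        acc |= value << used
        used += bit_count
    return acc.to_bytes((used + 7) // 8, "little")

block = pack_bits_lsb_first([
    (0, 1),     # BFINAL = 0 (assumed: not marked as the last block)
    (0b01, 2),  # BTYPE = 01, compressed with fixed Huffman codes
    (0, 7),     # end-of-block symbol (256): 7-bit code 0000000
])              # 10 bits used, 6 bits of padding
# Huffman codes are packed starting from their most significant bit; the
# end-of-block code is all zeros, so the order makes no difference here.

print(block.hex())  # prints 0200

# A receiver appends 0x00 0x00 0xff 0xff before inflating; the padding
# bits then act as the header of an empty stored block.
inflated = zlib.decompressobj(-15).decompress(block + b"\x00\x00\xff\xff")
```

The inflated result is empty, so sending these two octets as the final frame adds nothing to the message payload.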

 

 

 

Wouldn’t it be easier to make the removal of 0x00 0x00 0xFF 0xFF optional, while at the same time requiring that the receiving side append 0x00 0x00 0xFF 0xFF to the final frame when it is missing?

 

 

I think it complicates decoders. Code that doesn't know the DEFLATE decompressor's state cannot check whether 0x00 0x00 0xFF 0xFF at the end is really an uncompressed block body or not. It could be something else, e.g. the body of a compressed block.
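One way to see the ambiguity is a hand-built stored block whose payload happens to end with the same four octets. This is only a sketch using Python's struct and zlib modules; the payload is made up for the demonstration.

```python
import struct
import zlib

TAIL = b"\x00\x00\xff\xff"

# Application data that happens to end with the same four octets.
payload = b"data" + TAIL

# Stored (uncompressed) DEFLATE block: one header octet (BFINAL=0,
# BTYPE=00, zero padding), LEN, NLEN (one's complement of LEN), raw bytes.
length = len(payload)
block = b"\x00" + struct.pack("<HH", length, length ^ 0xFFFF) + payload

# The block ends with TAIL, yet those octets are payload, not the
# LEN/NLEN fields of an empty block left by a flush.
looks_like_marker = block.endswith(TAIL)
inflated = zlib.decompressobj(-15).decompress(block)
```

A check that only looks at the last four octets of the message sees the same bytes in both situations; telling them apart needs the decompressor's position within the block structure.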

 

It is much simpler to just check the final bytes of the message instead of having to remember the state of the previous frame.